Mixture of Expert/Imitator Networks: Scalable Semi-supervised Learning Framework
The current success of deep neural networks (DNNs) in an increasingly broad
range of tasks involving artificial intelligence strongly depends on the
quality and quantity of labeled training data. In general, the scarcity of
labeled data, which is often observed in many natural language processing
tasks, is one of the most important issues to be addressed. Semi-supervised
learning (SSL) is a promising approach to overcoming this issue by
incorporating a large amount of unlabeled data. In this paper, we propose a
novel scalable method of SSL for text classification tasks. The unique property
of our method, Mixture of Expert/Imitator Networks, is that imitator networks
learn to "imitate" the estimated label distribution of the expert network over
the unlabeled data, which potentially contributes a set of features for the
classification. Our experiments demonstrate that the proposed method
consistently improves the performance of several types of baseline DNNs. We
also demonstrate that our method has the "more data, better performance"
property, with promising scalability in the amount of unlabeled data.
Comment: Accepted by AAAI 201
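
The imitation idea described in the abstract above can be illustrated with a minimal sketch. It assumes the imitator is trained to minimize the KL divergence between the expert's predicted label distribution and its own on an unlabeled example; the abstract does not state the actual loss, and the function names here (softmax, kl_divergence, imitation_loss) are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def imitation_loss(expert_logits, imitator_logits):
    """Assumed training signal: how far the imitator's label distribution
    is from the expert's estimate on one unlabeled example."""
    p = softmax(expert_logits)    # expert's estimated label distribution
    q = softmax(imitator_logits)  # imitator's prediction
    return kl_divergence(p, q)


if __name__ == "__main__":
    # The loss is zero when the imitator matches the expert and grows
    # as the two distributions diverge.
    print(imitation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))
    print(imitation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0]))
```

No labels appear in this loss, which is why such a signal can in principle be computed over arbitrarily large unlabeled corpora.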
Lessons on Parameter Sharing across Layers in Transformers
We propose a parameter sharing method for Transformers (Vaswani et al.,
2017). The proposed approach relaxes a widely used technique that shares the
parameters of one layer across all layers, as in Universal Transformers
(Dehghani et al., 2019), to improve computational efficiency.
We propose three strategies, Sequence, Cycle, and Cycle (rev), for assigning
parameters to each layer. Experimental results show that the proposed
strategies are efficient in both parameter size and computational time.
Moreover, we show that the proposed strategies are also effective in
configurations with a large amount of training data, such as the recent WMT
competition.
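
The three strategy names suggest how layers might map to shared parameter sets. The sketch below is one plausible reading, not a definition taken from the abstract: Sequence gives consecutive layers the same parameters, Cycle repeats the parameter sets in order, and Cycle (rev) cycles first and then uses the sets in reverse order for the final layers. The function name assign_parameters and its arguments are hypothetical.

```python
def assign_parameters(num_layers, num_param_sets, strategy):
    """Return, for each layer, the index of the parameter set it uses.

    Assumed interpretation of the strategy names; the abstract names the
    strategies but does not define them.
    """
    n, m = num_layers, num_param_sets
    if not 1 <= m <= n:
        raise ValueError("need 1 <= num_param_sets <= num_layers")

    if strategy == "sequence":
        # Consecutive layers share one parameter set: 0, 0, 1, 1, 2, 2
        return [i * m // n for i in range(n)]
    if strategy == "cycle":
        # Parameter sets repeat in order: 0, 1, 2, 0, 1, 2
        return [i % m for i in range(n)]
    if strategy == "cycle_rev":
        # Cycle first, then the last m layers in reverse: 0, 1, 2, 2, 1, 0
        head = [i % m for i in range(n - m)]
        tail = list(range(m - 1, -1, -1))
        return head + tail
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for name in ("sequence", "cycle", "cycle_rev"):
        print(name, assign_parameters(6, 3, name))
```

With num_param_sets equal to 1, every strategy reduces to sharing a single layer's parameters across all layers, the setting that the abstract says the proposed approach relaxes.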
Large-Scale Semi-Supervised Learning for Natural Language Processing Using Neural Networks
Tohoku University, 乾健太郎, 課